Learning to Rank at Reddit: A Project Retro

Doug Turnbull and Chris Fournier • Location: Theater 7 • Haystack 2024

In today’s AI-based world, Reddit stands out as a deep catalog of human, subjective information. Whether they are looking for product reviews or something deeply personal, Reddit searchers want to connect with other humans, not generic AI-based answers.

We at Reddit would like the site-search experience to be better, so you don’t need to add “Reddit” to your Google search. That’s what we’re trying to do with Learning to Rank: turning relevance into a repeatable, data-driven solution.

The journey hasn’t been an easy one. We want to share the painful lessons we learned working with training data, developing features, the Solr Learning to Rank plugin, scaling Learning to Rank to thousands of QPS, and more. Hopefully, you can learn from the egg we constantly found on our faces!

See how our scrappy team has been slowly turning LTR from a science project into a repeatable process of constant, data-informed improvement. From a lab to an assembly line, come and learn from our painful lessons big and small.


Doug Turnbull

Doug Turnbull has been enthusiastic about search relevance since 2013. He co-authored Relevant Search and AI-Powered Search. He created Quepid and Splainer for search relevance testing. He co-created the Elasticsearch Learning to Rank plugin with Wikimedia Foundation and Snagajob. Doug loves learning from other search practitioners, and hopes you'll bring inquisitive curiosity and experiences to this talk. Doug currently works at Reddit, where he's helping bring Machine Learning to search. Recently Doug worked at Shopify to help improve merchant search-attributed revenue by 19% year over year. Doug spent 8 years consulting at dozens of organizations during his time as CTO at OpenSource Connections. Doug blogs about search and other topics at http://softwaredoug.com

Chris Fournier

Chris joined Reddit only 6 months ago to work on their search infrastructure, but has spent the past 12 years on search infrastructure and data warehousing teams, enabling queries of all sorts at companies including Shopify. Originally a machine learning researcher specializing in evaluation metrics, Chris kept going down the software stack and enjoys being able to work at every backend level: debugging query relevance, implementing and tuning various Python/Java search services (Elasticsearch, Solr, Spark), defining infra with Kubernetes, and designing and performing experiments to determine optimal search engine configurations.

Cliff Chen

Cliff has been at Reddit since 2017, originally working on backend API services before moving to home feed experimentation and eventually Search. Since joining the Search team 5 years ago, Cliff has worked on virtually every facet of Reddit Search, from calculating Trends and migrating the streaming indexing backend to Scala/Flink, to Spellcheck and bootstrapping the largest Solr cluster at Reddit for Comments. Currently Cliff is focused on productionalizing LTR and figuring out how to prevent spellcheck from correcting "stock" to "stonk".